On the Bayes-optimality of F-measure maximizers

Authors

  • Willem Waegeman
  • Krzysztof Dembczynski
  • Arkadiusz Jachnik
  • Weiwei Cheng
  • Eyke Hüllermeier
Abstract

The F-measure, which was originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is a statistically and computationally challenging problem, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate, while relying on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm which is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a quadratic (with respect to the number of binary responses) number of parameters of the joint distribution. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.
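The gap between F-measure maximization and the surrogate losses named in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the algorithm proposed in the article: it enumerates all 2^m candidate predictions by brute force, which is only feasible for very small m. The joint distribution `dist`, the function names, and the convention that the F-measure equals 1 when both the truth and the prediction are all-zero are assumptions made for this illustration.

```python
from itertools import product


def f1(y, h):
    """F1 of binary prediction h against truth y.

    Assumed convention: the score is 1 when both vectors are all-zero.
    """
    tp = sum(a * b for a, b in zip(y, h))
    denom = sum(y) + sum(h)
    return 1.0 if denom == 0 else 2.0 * tp / denom


def expected_f1(h, dist):
    """Expected F1 of prediction h under a joint distribution over label vectors."""
    return sum(p * f1(y, h) for y, p in dist.items())


def f_maximizer(dist, m):
    """Brute-force Bayes-optimal prediction for F1 (exponential in m)."""
    return max(product((0, 1), repeat=m), key=lambda h: expected_f1(h, dist))


def hamming_optimal(dist, m):
    """Risk minimizer for Hamming loss: threshold each marginal at 0.5."""
    marginals = [sum(p for y, p in dist.items() if y[i]) for i in range(m)]
    return tuple(int(q > 0.5) for q in marginals)


def subset_optimal(dist):
    """Risk minimizer for subset zero-one loss: the joint mode."""
    return max(dist, key=dist.get)


# Hypothetical joint distribution over three binary responses.
dist = {
    (0, 0, 0): 0.28,
    (1, 0, 0): 0.24,
    (0, 1, 0): 0.24,
    (0, 0, 1): 0.24,
}

print("F-maximizer:   ", f_maximizer(dist, 3))
print("Hamming-optimal:", hamming_optimal(dist, 3))
print("Subset-optimal: ", subset_optimal(dist))
```

On this toy distribution the Hamming-optimal and subset-optimal predictions are both the all-zero vector, whose expected F1 is 0.28, while predicting all three labels achieves an expected F1 of about 0.36. The example only shows that the surrogates can disagree with the F-measure maximizer; it says nothing about the worst-case regret bounds derived in the article.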

Journal:
  • Journal of Machine Learning Research

Volume 15

Publication date: 2014